Learning Phone Embeddings for Word Segmentation of Child-Directed Speech
Abstract
This paper presents a novel model that learns and exploits embeddings of phone n-grams for word segmentation in child language acquisition. Embedding-based models are evaluated on a phonemically transcribed corpus of child-directed speech and compared with their symbolic counterparts, using a common learning framework and feature set. Results show that learning embeddings significantly improves performance. We use extensive visualization to understand what the model has learned, and show that the learned embeddings are informative both for word segmentation and for phonology in general.
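The abstract does not spell out the model, so the following is only a minimal sketch under stated assumptions. It assumes word segmentation is framed as binary classification of each gap between adjacent phones, with phone n-gram features collected around the gap. The feature template (`ngram_features`), the sum-of-embeddings logistic scorer (`EmbeddingSegmenter`), the phone symbols and all hyperparameters are hypothetical illustrations, not the paper's architecture. Roughly speaking, a symbolic counterpart would give each n-gram its own scalar weight. Here each n-gram instead gets a learned vector, so n-grams that behave alike can share statistical strength.

```python
import math
import random


def boundary_instances(words):
    """(phones, labels) for one utterance given as a list of words (lists of
    phones); labels[i] == 1 iff a word boundary follows phones[i]."""
    phones, labels = [], []
    for word in words:
        for j, p in enumerate(word):
            phones.append(p)
            labels.append(1 if j == len(word) - 1 else 0)
    labels.pop()  # no decision after the utterance-final phone
    return phones, labels


def ngram_features(phones, i, max_n=2):
    """Hypothetical template for the gap after phones[i]: n-grams (n <= max_n)
    ending on its left ("L:"), starting on its right ("R:"), and the bigram
    straddling it ("S:")."""
    feats = []
    for n in range(1, max_n + 1):
        if i - n + 1 >= 0:
            feats.append("L:" + " ".join(phones[i - n + 1:i + 1]))
        if i + 1 + n <= len(phones):
            feats.append("R:" + " ".join(phones[i + 1:i + 1 + n]))
    feats.append("S:" + " ".join(phones[i:i + 2]))
    return feats


class EmbeddingSegmenter:
    """Hypothetical sketch: logistic boundary classifier over the sum of
    learned n-gram embeddings (not the paper's architecture)."""

    def __init__(self, dim=8, lr=0.1, seed=0):
        self.dim, self.lr, self.rng = dim, lr, random.Random(seed)
        self.emb = {}  # n-gram feature string -> embedding vector
        self.w = [self.rng.uniform(-0.5, 0.5) for _ in range(dim)]
        self.b = 0.0

    def prob(self, feats, create=False):
        """P(boundary) and the summed embedding h; unseen n-grams are skipped
        unless create=True (training), which gives them a random vector."""
        h = [0.0] * self.dim
        for f in feats:
            if f not in self.emb:
                if not create:
                    continue
                self.emb[f] = [self.rng.uniform(-0.5, 0.5) for _ in range(self.dim)]
            h = [a + b for a, b in zip(h, self.emb[f])]
        z = self.b + sum(w * x for w, x in zip(self.w, h))
        z = max(-30.0, min(30.0, z))  # clip the logit to avoid overflow
        return 1.0 / (1.0 + math.exp(-z)), h

    def update(self, feats, y):
        """One SGD step on the log loss; gradients flow into w, b and embeddings."""
        p, h = self.prob(feats, create=True)
        g = p - y  # d(log loss)/d(logit)
        old_w = list(self.w)
        self.w = [w - self.lr * g * x for w, x in zip(self.w, h)]
        self.b -= self.lr * g
        for f in feats:
            self.emb[f] = [v - self.lr * g * w for v, w in zip(self.emb[f], old_w)]

    def train(self, corpus, epochs=100):
        for _ in range(epochs):
            for words in corpus:
                phones, labels = boundary_instances(words)
                for i, y in enumerate(labels):
                    self.update(ngram_features(phones, i), y)

    def segment(self, phones):
        """Insert a boundary wherever P(boundary) >= 0.5."""
        words = [[phones[0]]]
        for i in range(len(phones) - 1):
            if self.prob(ngram_features(phones, i))[0] >= 0.5:
                words.append([])
            words[-1].append(phones[i + 1])
        return words
```

For example, one could train on a few toy utterances such as `[["dh", "ax"], ["d", "ao", "g"]]` with `model.train(corpus)`, then call `model.segment(["dh", "ax", "k", "ae", "t"])`, which returns a list of phone lists. The ARPAbet-style symbols are illustrative only; the corpus in the paper is a phonemic transcription whose symbol set is not given here.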
Similar resources
Statistical Speech Segmentation and Word Learning in Parallel: Scaffolding from Child-Directed Speech
In order to acquire their native languages, children must learn richly structured systems with regularities at multiple levels. While structure at different levels could be learned serially, e.g., speech segmentation coming before word-object mapping, redundancies across levels make parallel learning more efficient. For instance, a series of syllables is likely to be a word not only because of ...
Finding the gaps: applying a connectionist model of word segmentation to noisy phone-recognized speech data
The Christiansen model of word segmentation [1] is a connectionist framework for modeling how infants combine multiple cues in learning and processing language. Most studies applying this model assume idealized input with adult-like representations of phonemes and features, with little or no degradation of the input signal. From these studies, it is difficult to tell if the model is robust to n...
A statistical model for word discovery in child directed speech
A statistical model for segmentation and word discovery in child directed speech is presented. An incremental unsupervised learning algorithm to infer word boundaries based on this model is described and results of empirical tests showing that the algorithm is competitive with other models that have been used for similar tasks are also presented.
MAP Lexicon is Useful for Segmentation and Word Discovery in Child Directed Speech
An efficient algorithm for segmenting child-directed speech into words has recently been proposed in the Machine Learning journal. This short technical note proposes some modifications to this algorithm. In particular, a slightly more conservative variation of the original approach is proposed that infers word boundaries based simply on the maximum a-posteriori lexicon. Results of empirical tes...
Learning Words and Their Meanings from Unsegmented Child-directed Speech
Most work on language acquisition treats word segmentation (the identification of linguistic segments from continuous speech) and word learning (the mapping of those segments to meanings) as separate problems. These two abilities develop in parallel, however, raising the question of whether they might interact. To explore the question, we present a new Bayesian segmentation model that incorporates...